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Abstract. We consider from a general point of view the problem of determining the extinction in dense molecular 
clouds. We use a rigorous statistical approach to characterize the properties of the most widely used optical and 
infrared techniques, namely the star count and the color excess methods. We propose a new maximum-likelihood 
method that takes advantage of both star counts and star colors to provide an optimal estimate of the extinction. 
Detailed numerical simulations show that our method performs optimally under a wide range of conditions and, in 
particular, is significantly superior to the standard techniques for clouds with high column-densities and affected 
by contamination by foreground stars. 
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1. Introduction 

A detailed comprehension of the structure and physical 
properties of dark molecular clouds is a critical step to 
understand fundamental processes such as star and planet 
formation. Still, after several decades of investigations, 
very little is known about dark clouds, about their re- 
lationship to the diffuse interstellar medium, and about 
the composition and evolution of the dust grains present 
in the clouds. 

Molecular clouds are composed of approximately 99% 
molecular hydrogen and helium, but due to the absence 
of a dipole moment these molecules are virtually unde- 
tectable at the low temperature (~ 10 K) that charac- 
terize these objects. As a result, astronomers have been 
using different tracers to study the density distribution 
of molecular clouds. Historically, the first technique was 
based on a statistical analysis of the angular density o f 
stars observed through a cloud (fWolf 1923; B oklll937f) . 
This method, known as star counts, uses the dust (which 
is responsible for the extinction of light) as a tracer of H2 
and He. 

More recently, radio observations of carbon monox- 
ide (CO) and its isotopes have been used to study the 
spatial distribution and the physical conditions of molec- 
ular clouds, under the assumption that CO is a reli- 
able tracer of the matter in the cloud, i.e. that the ra- 
tio iV(CO)/(iV(H 2 ) + AT (He)) is approximately constant. 
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Radio spectroscopy techniques have provided extremely 
effective in studying the most dense regions (above 
10 22 protons cm" 2 ) and, more importantly, give dynamical 
information on the cloud structure. However, with the ad- 
vent of various dust detectors in the near- infrared (NIR) , 
far infrared, millimeter, and sub-millimeter bands, it has 
become clear that several poorly constrained processes 
(e.g., deviations from thermodynamic equilibrium, opac- 
ity variations, chemical evolution, and more importantly 
depletion of molecules) can significantly affect the results 
of analyses based on radio observations (e.g. lLada et alJ 
11994 lAlves et al.lll999t lHariunpaa et alJl2004 - 



The reddening of background stars offers a natural 
method to study the distribution of dust in molecu- 
lar clouds, and thus the hydrogen column density. This 
technique can be better applied to the infrared bands, 
which compared to optical bands are less affected by 
extinction and are less s ensitive to th e physical prop- 
erties of the dust grains l|Mathisl [19901. Before the ad- 
vent of large format array cameras, the lack of instru- 
mental sensitivity clogged infrared observations to small, 
dense clouds (e.g . Lfcjieset^.1 119801 iFrerking et alJll982t 
Ijones et al.ll 984: Casali 1986). More recently, infrared ar- 
rays have made it possible to measure thousands of stars 
and to extend th e original technique to entire molecular 
cloud complexes llLada et al.ll99lll999tlAlves et al.l200ll 
lLombardi fc Alvesll200l|r Such measurements are free of 
the complications that plague molecular-line data and per- 
mit a detailed analysis of the cloud density distribution. 
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Although the success of the NIR color excess method 
is evident, there is still a need for a deeper understanding 
of its limitations, statistical biases, and uncertainties. The 
aim of this paper is twofold: (i) to study in detail the sta- 
tistical properties of the star count and color excess meth- 
ods and (ii) to describe a new, optimal method based on 
a maximum-likelihood analysis. The method is described 
here for NIR observations, but could equally well be ap- 
plied to other infrared bands for which the extinction law 
and the intrinsic color distribution are known with good 
accuracy. The paper is organized as follows. In Sect. [21 
we introduce our formalism and consider from a statisti- 
cal point of view the extinction of stars by a foreground 
cloud. The two main techniques used to obtain extinction 
measurements, namely the star count and the NIR color 
excess methods, are discussed in Sect. [3J In Sect. 0] we 
describe the new maximum-likelihood method and show 
by means of numerical simulations that it performs excel- 
lently even in the presence of a significant contamination 
by foreground stars. Our conclusions are briefly reported 
in Sect. 03 Finally, in Appendix El we consider the sta- 
tistical properties of the median and of related estimators 
used often in infrared studies of dark clouds to remove the 
effects of foreground stars. 

2. Basic relations 

Suppose that N stars with magnitudes (on a given band) 
{m n } are observed in a given region of the sky (here- 
after we will use the hat " to denote measured quanti- 
ties). These observed magnitudes depend on the original, 
unreddened, absolute star magnitudes {M„}, on the star 
distances {D n }, on the individual extinctions {A n }, and 
on the photometric errors {e n }. In particular, we have 

rh n = M n + 5 log 10 D n - 5 + A n + e n = m n + e„ , (1) 

where the distances {D n } are taken to be expressed in 
parsecs. The quantity m„ that appears in the r.h.s. of this 
equation is the reddened magnitude of the n-th star free 
from measurement errors (i.e., the magnitude that would 
be observed in the limit of an extremely long exposure 
time). 

2.1. Single-band probability distributions 

In order to consider the most general situation in a proper 
statistical way, we introduce several probability distribu- 
tions: 

p(M,D): the probability distribution (density) of stars 
with absolute magnitude M at distance D; 

Pa{A\D): the probability distribution of extinction for ob- 
jects located at distance D, i.e. the probability of an 
extinction of A magnitudes on a star given that its 
distance is D; 

p e (fh\m): the probability distribution of photometric 
measurement errors, i.e. the probability of measuring 
a magnitude rh for a star given that its true reddened 
magnitude is m. 



These probability distributions are now defined in the 
most general way. Later, however, we will consider spe- 
cial forms of them that allow us to simplify the relevant 
equations. Note also that we take the extinction A at dis- 
tance D as a random variable. Hence, we have introduced 
above the probability distribution pa(A\D) rather than 
a (deterministic) function A(D). This general approach 
can be used to describe molecular clouds with "patchy" 
colu mn densities, which seems to be q uite common (see, 
e.g.. lLada et alJll99pJ: ICambresvlll999^ . Finally, we use a 
general form for the photometric error that includes the 
common case where the error depends on the magnitude 
of the star. Moreover, we also consider through p e (m\m) 
the case of undetected objects, i.e. the case where a star 
of true magnitude m close to the detection limit produces 
no detectable flux; in this case we write m = null. Hence, 
the quantity 

c(m) = 1 -p £ (null|m) (2) 

represents the completeness of our observations for stars 
with true magnitude m. 

We assume that both pa(A\D) and p e (m\m) are nor- 
malized to unity, i.e. 

/>oo 

/ PA (A\D)dA = l WD, (3) 

Jo 

/oo 
p e (rh\m) dm = 1 Vm . (4) 

-oo 

Moreover, we take p(M, D) to be normalized so that the 
quantity 

dN = D 2 p{M,D)dDdQdM (5) 

represents the expected number of stars located at a dis- 
tance in the range [D,D + dD] in a patch of the sky of 
solid angle dil, and with absolute magnitudes between M 
and M + dM. 

Equation can be used to obtain the probability dis- 
tribution of reddened magnitudes p m (m) . Using the pre- 
vious definitions we find 

y* oo />oo 

p m (m) = dDD 2 dA PA (A\D) 
Jo Jo 

x p(rn — 5 log 10 D + 5 — A, D) . (6) 

Finally, p m (m) can be converted into the distribution of 
observed magnitudes Pm(rh) by convolving it with the 
measurement error probability distribution p e (rh\rn): 

/>oo 

Prh{rh)= \ Pe(m\ m )Prn{m) dm . (7) 

J oo 

Because of the way p e (rh\m) is defined, Eq. Q includes 
also the case m = null. Note also that Pm(rh) is normalized 
so that pm (m) dm represents the expected angular density 
of stars with observed magnitudes on the range [rh, m + 
dm]. 

Equations © and (JJJ are basic equations of this paper. 
However, in the form they are written, they are by far too 
general to be useful in practical cases. 
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2.2. Simplifications 

In order to make use of the general relations written above 
we introduce a number of useful simplifications. 

First, we note that so far we have considered obser- 
vations carried out on a single band. In order to threat 
multi-band data, we introduce a reddening law as follows: 
for any band A, the extinction A\ is given by 



A x = k x A 



(8) 



where k\ is a constant and Ay is the extinction in the V 
band. In other words, Eq. ||SJ| states that the ratio Ax/Ax> 
between the extinction in two bands A and A' is a constant 
for a given cloud. In reality, this ratio is known to vary 
with the environment for short wavelengths, but is almost 
universal in the NIR (a nd, according to a recent work of 
llndebetouw et all I2004L for wavelengths in the range 3- 
10 /im as well). In the rest of this paper we will follow the 
usual notation and express all extinctions in terms of Ay ■ 
A second important assumption that we will use re- 
gards the functional form of p(M,D). First, we will as- 
sume that this function can be separated in M and D, 
i.e. can be written as the product of two functions, one 
involving M only, and one involving D only: p(M, D) = 
Pm(M)p(D). In practice, this assumption is equivalent to 
saying that we observe a single "population" of stars at 
all distances. Since observations show that star number 
counts are well approximated by an exponential law of 
the form p m (m) cx 10 Qm (where the exponent a ~ 0.34 is 
approximately independent of the band considered in the 
NIR), we also assume that pm(M) follows a similar law. 
We then write 



p{M,D) = v x lQ aU p{D) , 



(9) 



where v\ is a normalization factor. In the following we will 
assume that p{D), similarly to a, is independent of the 
band considered, while v\ is not. Note also that, because 
of the exponential form of this equation in M, a change 
of the normalization factor v\ is in practice equivalent to 
a change of the zero-point used for the magnitude M. 

Using Eqs. J3J and (JSJ we can slightly simplify Eq. (JBJ 
by performing the integration over D: 



Pm(m) 



ADD 



2-5a 



P(D) 



/ dA vPAv (A v \D) 
Jo 



x z/ A 10 5 "10" m 10" 



xk\Ai 



= av x lQ am / dA vPAv (A v )W- a ^ Av , (10) 



where we have defined a distance- weighted probability dis- 
tribution for Ay as 

1 f)5a poo 

p Av {Av) = / dDD 2 - 5a p(D) PAv (A v \D) , (11) 

a Jo 

and where a is a numerical factor introduced in Eq. (|10fl 
to ensure that pa v {Av) is normalized to unity: 



/>oo 

10 5Q / dDD 2 - 5a p{D) 
Jo 



(12) 



Note that pA v {Ay) has a simple interpretation, being 
the probability distribution for stars to have undergone 
a given extinction regardless of their distance. For exam- 
ple, if all stars were subject to the same extinction Ay, we 
would have pa v (A' v ) = 5{A' V — Ay). Note also that, since 
a ~ 0.34 can be taken to be approximately independent of 
the band considered, pA v {Ay) is also band-independent. 

Equation (|10f> shows an interesting property of extinc- 
tion studies: the probability distribution p m {m) depends 
on pA v (Ay\D) only through pA v (Ay). This point is par- 
ticularly important since it shows an intrinsic limitation 
of extinction measurements. Indeed, all observables are 
derived from pm(m), the distribution of observed magni- 
tudes, and this function depends only on p m (m) (other 
than observational "limiting factors" such as p e (m\m) or 
c(m)). Since, as noted above, p m (jn) does not depend di- 
rectly on pA v (Ay\D), any estimator based on observed 
magnitudes only 1 will not provide any information on 
PA V (Ay\D) directly but (in the best case scenario) only 
on pA v (Ay). Since pA v (Ay) is a sort of convolution of 
PA V (Ay |D), in general it is not possible to have a complete 
knowledge on pA v {Ay\D), not even if we know the star 
density distribution p(D); this limitation, among other 
things, prevents us from gathering information on the 
three-dimensional structure of a molecular cloud. 

A case where, instead, the function pa v (Ay\D) can be 
recovered is a deterministic model for the extinction, i.e. 
a cloud complex whose extinction Ay depends directly on 
the distance D (and, thus, is not a random variable). In 
this case we have 

PAv (Ay\D) = 5(Ay - Ay (£>)) . (13) 

The extinction Ay{D) has a simple relationship with the 
integrated gas column density p E&s '- 



Av(D) = i J\d' p sas (D') , 



(14) 



where (3 ~ 2 x 10 21 cm" 2 mag - 1 is the ratio between gas 
densi ty and V-band extinction l|Lillevlll955t iBohlin et alJ 
Il978j) . Suppose now that we carry out observations on the 
cloud and measure pa v {Ay). In order to show that we can 
recover the structure of the cloud, let us call D(Ay) the 
inverse of the function Ay(D) defined in Eq. ifTl^l . i.e. the 
distance at which the integrated extinction is Ay. Then 
we have 



PA V (A\ 



10 



5o 



D(Ay)] 2 5a p(D(Ay))D'(Ay) 



10 5q p(D{Ay)) d 



3 - 5a dA x 



[-D(Ay)] 



3-5a 



(15) 



1 Through this paper we assume that the distance of individ- 
ual stars is not an observable; if, instead, the distance of each 
star can be estimated, then in principle one can also directly 
measure pa v (Av\D). 
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In other words, we have obtained a differential equation in 
[D{A V )\ . Ifp(D) is known, using the boundary con- 
dition -D(O) =0 we can in principle solve this differential 
equation and obtain D(Ay) which, in turn, can be in- 
verted into Ay{D) and p gas {D). It is superfluous to stress 
that this process in reality is far from being trivial. 

Finally, we consider a very simple deterministic model 
for pa x {A\\D) that describes a thin cloud at a distance 
D with uniform column density Ay: 



PAv (A' V \D) = 



S(A'y) 

8{A'y - Ay) 



if D < D Q , 
otherwise . 



(16) 



The related distribution pA v (Ay) takes then the simple 
form 

p Av {A'y) = fS(A'y) + (1 - f)6(A' v - Ay) , (17) 
where / is a real number in the range [0, 1] given by 



/ 



-Do 



dDD 2 ~ 5a p{D) 



dD D 2 ~ 5a p(D) 



(18) 



Hence, / represents the fraction of foreground stars (i.e. 
stars at distance D < Dq) present in any apparent mag- 
nitude bin in regions with negligible extinction. In the 
following, we will refer to the simple case described in 
Eqs. Ijl6|l and Ijl7|) as "thin cloud approximation" (the 
term is borrowed from gravitational lensing theory). 

2.3. The completeness function 

Above, in Eq. we introduced the completeness func- 
tion c(m), which gives the probability to detect a star of 
magnitude to. Typically, c(m) is close to unity for bright 
stars, and vanishes for very faint stars; note however that 
c(to) might be smaller than unity at low to in crowded 
fields or close to bright objects. The completeness func- 
tion c(m) enters naturally in the definition of the error 
probability distribution p e (m\m), which can be written as 



p e (m\m) = 



c(m)p e (m\m) if to ^ null , 



1 — c(m) 



if to = null . 



(19) 



This representation reflects the measurement process: for 
any star, there is first a random process that "decides" 
whether the object is detected or not (with probability 
fixed by c(m)); then, if the star is detected, there is a 
second random process that generates the observed mag- 
nitude to according to p e (m\m). Note also that this latter 
probability distribution is normalized [cf. Eq. 

In order to carry out some simplifications (see below 
Sect. 14. 2|) . we also consider a different representation of 
the completeness function in which the order of the two 
random processes described above is swapped: this leads 
to a completeness function c(to) defined in terms of the 
observed magnitude. In other words, we first generate for 



every star the "observed" magnitude to according to a 
probability distribution p e (to | to), and then decide whether 
the star is really observed using a second random process 
controlled by the completeness function c(m). This im- 
plies, among other things, a modification of Eq. JJJ), that 
now becomes 



Pmfm) = c(to) / p e (rh\m)p m (m) dm 



(20) 



This equation is valid for detected stars, i.e. if to 7^ null. 
Since c(to) represents the probability that a star with mea- 
sured magnitude to be detected, we evaluate the proba- 
bility that the star is not detected as 

/oo />oo 
dm [l — c(to)] / dmp e (m\m)p m (m) . 
-00 J —00 

(21) 

An important point to observe here is the fact that we have 
modified Eq. ® into Eqs. $2$ and J2IJ by defining two 
functions, the new error distribution p e (m\m) and the new 
completeness function c(m), which replace, respectively, 
p e (m\m) and c(to). Indeed, both these couples of functions 
are intimately related, and this was implicitly taken into 
account above by using a single distribution p e (m\m) to 
describe both the photometric errors and the completeness 
of the observations. Interestingly, it is possible to find a 
relationship between the couple p e (m\m), c(m) and the 
couple p e (rh\m), c(m) by requiring that, for any reddened 
magnitude probability distribution p m (m), the observed 
magnitude probability distribution j)„ (to) evaluated using 
Eq. (0), or using Eqs. (|2"0"|l and f2"T|) . agrees. We find then 

c(to)p £ (to|to) = c(to)p c (to|to) , (22) 

/oo 
p e (m\m) [l — c(m)] dm . (23) 

Since p f {m\m) is normalized to unity, if we integrate 
Eq. l(2"2"|l over to and substitute the result into Eq. (|2"3"|) . 

we find 



p e (m\m) dm = 1 , (24) 
i.e. p e is also normalized to unity. In summary we have: 

/oo 
c(to)p £ (to|to) dm , (25) 
-00 

c(m)p e (m\m) 



p e (m\m) 



c(m)p e (m\m) dm 



(26) 



Since Eq. I|25|l involves a convolution, it is in general not 
possible to invert it and express c(to) in terms of c(m). 
Note that Eq. 126|l is essentially Bayes' theorem applied 
to p t {m\m). 

Although Eqs. I|25|) and l|26|) show that the description 
of the completeness c(to) in terms of the true reddened 
magnitude to is more general than the description c(m) 
in terms of the observed magnitude to, we argue that the 
latter is more practical to use: 
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Operationally, the completeness function is usually 
evaluated from the observed magnitudes by compar- 
ing the expected number of stars in a given magnitude 
bin with the number of stars detected in the same bin. 
Often, it can be sensible to use some a posteriori cuts 
in the star catalog. For example, if we know that ob- 
servations of faint stars are particularly unreliable, we 
can just discard all objects with observed magnitude 
to larger than a given threshold m llm . This cut could 
be easily described in terms of c(m): 



, , , . 1 if TO < TO llm , 

c(m) = H m lim -m)= (27) 

otherwise . 



— The use of c(m) allows us to evaluate analytically 
Pmijn) for some special simple probability distribu- 
tions. This point is particularly valuable for the practi- 
cal applications that we will describe below in Sect. 01 

2.4. Multi-band probability distributions 

So far we have focused our analysis on single-band mea- 
surements. In order to extend the discussion to observa- 
tions carried out in different bands, we need to generalize 
the relevant probability distributions. 

We denote M = {Mi, M 2 , . . . , M A } the magnitudes 
of a star in A different bands; in general, we use bold 
symbols such as k to indicate quantities that have different 
values in the various bands. We will threat these as vector 
quantities; in this way, e.g., the generalization of Eq. Q 
can written as 

rh n = M n + 5 log 10 D n - 5 + kA v + e„ = m n + e„ . (28) 

Since we are now working with observations in differ- 
ent bands, we need also to generalize p(M, D). We define 
thus p(M, D), the probability to have a star with absolute 
magnitudes M at distance D from us. Using this distri- 
bution we can rewrite Eq. jBJ as 

/>oo />oc 

Pm (m)= ADD 2 AA vPAv (A v \D) 
Jo Jo 

x p(m - 5 log 10 D + 5 - kA v .D) . 

(29) 

The generalization of Eq. JJJ) is also simple. Since 
photometric measurements in different bands are inde- 
pendent, we have to perform A different convolutions 
with the measurement error probability distributions 
{p ex (rh\\m\)} corresponding to the various bands: 



f A 



(30) 



Note that we consider a star detected if it is detected in 
at least one band. 

The integral of Pm (in) over all admissible values of rh 
gives the expected local density of stars a: 



a = / p rfl (m)d A m 

J(RU{null}) A \{null} 



(31) 



Since a gives the normalization of pm(rn), the conditional 
probability that a star with magnitudes rh be observed 
given the fact that the star is detected is Pm(m)/(T. 

When using the alternate completeness functions 
{c\(m\)} expressed in terms of the observed magnitudes 
{toa}, Eq. H3()J) can be rewritten as 

f A 

Prh(rh) = / d'W Yl ~ to a)ca(to' a ) 

J \ i 



A=l 



+ 5 (rax -mill)(l - c A (m A )) 



d\np m (m) Y[ Pe A (TO A l m A) 



(32) 



The combination of delta distributions inside the brackets 
in this equation ensures that rh! x — rh\ if rh\ ^= null, and 
that we integrate over (l — c\(rh' b )) (the probability of not 
detecting the star) if fh\ = null; the last integration is the 
usual convolution with the measurement errors. Similarly, 
Eq. H31J) can also be written as 



a= /" dV 1- f[(l-cx(m x ) 

A 

d'Vnp^m) Y[ Pe x (mx\m\) 



(33) 



A=l 



2.5. Further simplifications 

As for the single-band case, we consider a number of re- 
alistic and useful simplifications that will allow us to take 
advantage of the formalism introduced above in practical 
cases. 

First, we still take p(M, D) to be separable into 
p(M,D) = pt l [(M)p(D). Furthermore, we adopt for 
Pm(M) a simple functional form, which is sufficiently ac- 
curate for our purposes. Since we know that p m (m) cx 
10 Qm with a approximately independent of the band, and 
since stars appear to have a limited scatter in their NIR 
colors, we write 



p M (M) = ^10 QMl 

(M a - Mi - X a)Cj(M b - Mr - X b) 



x exp 



(34) 



where we have used Einstein's convection on repeated in- 
dexes. In other words, we suppose that Mi is exponen- 
tially distributed, and that star colors are distributed as 
Gaussian random variables with averages (M a —Mi) = \a\ 
C represents the covariance of these colors. This distribu- 
tion can be rewritten in a more convenient way as (see 
also below Sect. 14. 211 



p M (M) = exp 



M T PM + 2Q T M + R 



(35) 
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The form l|34[l for pm{M) is particularly convenient for 
several reasons. First, we can greatly simplify Eq. (|2"9")l and 
obtain a result similar to Eq. (|1(J|> : 



kx 



p m {m) = a 



/ dA v p Av (A v )p M (m - kAi 
Jo 



(36) 



where pA v {Ay) and a are still given by Eqs. and 
l(T2*|) . Again, we observe that the fact that p m (m) de- 
pends only on the distance- weighted PA v {Ay), implies 
that pa v (A v \D) is not an observable. Moreover, the forms 
l|34|) and l|35|l are unmodified by a reddening M i— > m = 
M + 5 log 10 D — 5 + kAy and by a convolution with 
Gaussian photometric errors (see below Sect. I4.2|l . 



3. Extinction measurements 

We will describe in this section the star counts and color 
excess methods in order to better describe the advantages 
and limitations of them. 



3.1. Star counts 

In the 18th century the English astronomer William 
Herschel noted that some regions presented few stars and, 
following Newton's idea of a perfectly transparent space, 
interpreted these "holes in the sky" as a real lack of stars. 
This miscon ception surviv ed the discovery of individual 
dark clouds l|Barnardlll919l) and had serious consequences 
on Shapley's calibration of the distance scale for Cepheids. 
The "disco very" of du st in our Galaxy took place only in 
1930 when iTrumnlerl showed its importance in dimming 
the light coming from distant open clusters. Finally, in 
recent decades dust has no longer been seen only as an- 
noying "fog" obscuring the light of background sources, 
and has been shown to have a tremendous impact on the 
evolution of galaxie s and on the formatio n of stars and 
stellar systems (see iLi fc Greenberel 120031 for a detailed 
historical review). 

It has long been recognized <)WolJll92^ that measure- 
ments of the local density of stars in different regions of 
the sky can be used to map the extinction. The original 
technique consisted in comparing the number of stars in 
magnitude bins in regions subject to extinction with the 
number of stars in regions where the extinction is (sup- 
posedly) n egligible. This technique was then improved by 
iBokl l(l956h which suggested to use counts up to a limiting 
magnitude to reduce the error. In the past, the star count 
technique was mainly applied to o ptical data (typ i cally vi- 
sually inspected Smith plates: e.g.lDickmadll978tlM"attilal 



Il98fit lAndreazza fcVilas-Boaslll996ft . In the last decade, 
however, near-infrared digital data have been available, 
and the star count tech nique has been finally applied 
to NIR observations (e.g. ICambresv et al.lll997|) . In this 
respect, a key role has been played by large NIR sur- 
veys such as the Two Micron All Sky Survey (2MASS; 
iKle inm ann et alJ 1994) and the Deep NIR southern Sky 
Survey (DENIS: lEnchtein et al.lll997ft . 



(Rieke & Lebofsky 1985) (van de Hulst 1946) 



J 
H 

K 
L 

M 
N 



0.282 
0.175 
0.112 
0.058 
0.023 
0.052 



0.246 
0.155 
0.089 
0.045 
0.033 
0.013 



Table 1. Th e extinction law in differ ent i nfrared bands 
(take n from iRieke fc Lebofskvl Il985l and Ivan de Hulstl 
119461) . 



The star count method is easily described using our 
notation. We first note that, as for multi-band probability 
distributions, the integral of Pm{m) over m gives the local 
density of stars a [cf. Eq. (|31|l ] : 



Prh{m) dm 



diiic(m) J dm p m (m)p e (m\m) . (37) 

Inserting here Eq. I|1L)|I . we immediately obtain 

o = <J (Q) J c\A v pA v (A v )\Q- a ^ Av , (38) 

where £T<°) iS the average density expected in absence of 
extinction: 



r (0) 



av\ I diiic(m) / dm 10 am p e (m|m) 



(39) 



Equation (|38|l shows that the ratio a/a^ between the 
average densities expected in presence and in absence of 
extinction is simply related to pA v (Ay). 

Clearly, a single measurement of a/a^' cannot be used 
to derive the distribution pA v {Ay). However, in the sim- 
plest case where all stars are background to a thin cloud, 
so that pa v {A' v ) = 5(A V 
lation 



Ay), we find the classical re- 



er 



10 



-ak\Av 



(40) 



which can be immediately inverted to obtain Ay (Be... 

Operationally, both densities a and ct^ ^ are mea- 
sured by dividing the number of stars observed against the 
cloud and in a control field by the angular size of the re- 
gions considered. Hence, the measurement errors on these 
quantities are due to the randomness on the local number 
of stars. If we ignore the correlation on the star positions, 
and thus assume that these a re a homogeneous Poisson 
process (see, e.g.. lCressielll993]) . then the local number of 
stars follows a simple Poisson distribution. 

Although the estimator (|4U|I can be applied to thin 
clouds only, in principle more general situations can also 
be handled by taking advantage of the particular form of 
Eq. H38[) . We first observe that the integral appearing in 
the r.h.s. of this equation is reminiscent of a Laplace's 
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transform. Given a real function f(x) defined for non- 
negative values x, its Laplace's transform is defined as 



C[f](k) 



f(x) exp(— kx) 



(41) 



An important property of Laplace's transform is that it 
can be inverted: in other words, it is possible to obtain 
fix) provided one knows C[f](k) for any non-negative 
k. Using the notation just introduced, we can rewrite 
Eq. ® as 

a = a^C[p Av }{ak x \n 10) . (42) 

Hence, if we were able to measure the ratio a/a^ for all 
positive values of the product ak\ , we could in principle 
derive pa v - As indicated above, the constant a ~ 0.34 is 
approximately independent of the band considered; how- 
ever, k\ strongly depends on the band used to carry out 
the observations (see Table E). Hence, the product ak\ 
that appears in Eq. (|42|) can be varied among a limited 
set of values. In conclusion, although complete knowledge 
of pa v is clearly impossible, we can still use multiband ob- 
servations to investigate pa v in more complex situations 
than the one considered in Eq. 1)41) |). 

Let us consider, for example, the case of the thin cloud 
approximation described in Eq. JT7J|. Since, in general, 
the fraction / of foreground stars is not known, we want 
to estimate both Ay and / in a given patch of the sky. 
[Although / can be taken to be constant in many cases, 
changes on this quantity have to be expected for different 
regions of large cloud complexes because of the geometry 
of the cloud and of changes of star densities due to the 
Galactic structure.] For this, we insert Eq. I|17(l in Eq. I|38|) . 
thus obtaining a simple generalization of Eq. I|40|) : 
a 
1M 



/ + (1-/)10 



-ak\A\ 



(43) 



Hence, we can in principle use two different measurements 
of a in two different bands to deduce both / and Ay on 
the region considered. In practice, it is preferable to follow 
the approach described below in Sect.^J 

It is interesting to investigate in more detail the esti- 
mator derived from Eq. (|43|l . i.e. 



Av = 7T lo Sio 

ak x 



N-Atr^f 



.Ar(°)(l-/)J ' (44) 

where N is the number of stars found in the region inves- 
tigated, and A is its area. Assuming that is measured 
without significant errors (this is possible by using a large 
control field), we can deduce the expected error on Ay by 
noting that N follows a Poisson distribution with average 
[cf. Eq. g3] 



(N) = Aa<® [/+(!- /)10" 



(45) 



Using a first order approximation (valid for small relative 
errors) we find 



Var(i y ) = 



1 



1 



ak x In 10 (N) - AfcrW 



(TV). (46) 




Ay 

Fig. 1. The cumulative probability distribution for 
Ay. The various curves are relative to a true ex- 
tinction Ay = [0,2,4,6,8,10] (from the left to 
the right); the average values measured are instead 
(Ay) = [0.24,2.33,4.47,6.64,8.79,10.76] (the scatters 
around these values ranges from 1.9 to 4.3). In all cases we 
assumed a = 0.34, k x = 0.175, / = 0.1, and A<7 {0) = 20. 
Along the curves the dots mark the actual possible val- 
ues of Ay that can be measured for different values of the 
observed number of stars N . 



i 



T 



~i — i — i — r 



"i — i — i — r 




ArW = 10 

- At(°) = 20 

- Ax(°> = 50 



_J I I I I I I I L_ 



10 

A v 



15 



20 



Fig. 2. The bias (thick lines) and error (thin lines) on Ay 
evaluated using Eq. (|44|l for different values of Aa ( °> . For 
this figure we used the same parameters as in Fig^ i.e. 
a = 0.34, k x = 0.175, and / = 0.1. 

In case of vanishing / (i.e., if all stars are background to 
the cloud), this expression takes a simpler form 



11.4 




akx In 10 



Err(Ay) = JWax(A 



where the last expression is valid for the K band (cf. 
Tab.UJ. 

Further statistical properties of the estimator l|44l) can 
be better evaluated using numerical methods. Figure Q 
shows, as an example, the expected cumulative probabil- 
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ity distributions for Ay for some typical cases. As ex- 
pected, the cumulative distributions reach 0.5 around the 
true value for Ay, but the expected measurement errors 
can be very large, especially at high column densities. 
The numerical analysis of Eq. (|44|) also shows that the 
estimator is significantly biased. In general, in the limit 
a A ^> 1, the estimator is biased toward large values of 
Ay, i.e. (Ay) > Ay, the opposite happens for a A ~ I 
(see Fig. |2J . This change in behavior can be understood 
by observing that, if Ay is large, then Aa is small and 
thus we simply do not have enough background stars to 
probe high column densities. 

Interestingly, Eq. (|46() is a reasonably good approxi- 
mation of the true variance of Ay . Typically, this approx- 
imation slightly underestimates the true variance of Ay, 
but again the opposite happens for low values of aA. 

3.2. Near-infrared color excess (NICE and NICER) 

The dust present in molecular clouds produces differ- 
ent amounts of obscuration in different color bands (see 
Table 0, so that background stars appear reddened. 
Hence, the color excess in the red part of the spectrum 
of background stars can be used to measure the extinc- 
tion along the line of sight. 

Historically, the color excess technique has been im- 
peded by the limited instrumental sensitivity to small re- 
gions ijjones et al.lll980t iFrerking et alJll982t Ijones et al.1 
However, in the early 1990s, with the advent of 
near-infrared arrays, it became possible to accurately mea- 
sure the NIR magnitudes of many stars from single point- 
in g observatio ns. This new technology was first exploited 
bv lLada et al.l ( 19941) , and since then h as been successfully 
applied to many molecular clouds ( e.g. lHorner et alll997t 
lLada et al.lll99a lAlves et alJl200l|) . 

Lada's technique (called "near-infrared color excess" 
or Nice) is based on measurements of a NIR color (e.g., 
H — K) of many stars. Since stars have relatively well 
defined colors in the infrared, a significant intervening ex- 
tinction can be detected as a reddening. Note that a key 
point of the Nice method (and of similar methods based 
on the reddening of stars; see, e.g.. lSchultheis et al.lll999T) 
is the assumption that all stars belong to a homogeneous 
population. 

The Nice method can be quantitatively described us- 
ing the notation introduced so far. In particular, from 
Eq. we have 



Pm{m) = p$(m - kA 



(48) 



where we have denoted pm\m) the l.h.s. of Eq. I|29|) for 
Ay = 0. Naively, Eq. (|4"5|) can be used to obtain a simple 
estimate of the intervening infrared extinction Ay. For 
example, we find 



\(o) 



kA\ 



(49) 



where again we used the superscript (0) to denote quan- 
tities measured in a control region where Ay = 0. 



Equation (|49l) is better rewritten in terms of star colors. 
Calling x\ = m x — m i an d k\ = k\ — k\, we have 



(x) = (x) (0) 



k-Ai 



(50) 



This equation, applied to a single color, is essentially the 
Nice technique. More precisely, this technique uses the 
simple average of a set of N angularly close stars to eval- 
uate the column density: 



Ay = — 



1 Xr, 



X. 



(0) 



N ^ 

71 = 1 



(51) 



where x^ is an estimate of (x)^ ' (obtained, e.g., by mea- 
suring the star colors on a control field where presumably 
Ay ~ 0). As an example, consider the Nice method ap- 
plied to the x = H — K color. Since stars have a typical 
scatter of 0.09 mag in this color, we expect an error on Ay 
of Err(iy) ~ 1.4/VA 7 - Hence, even in the presence of sig- 
nificant photometric errors, the Nice method gives signif- 
icantly more accurate results than the star count method 

[cf. Eq. gZJ|]. 

As shown bv lLombardi fc Alvesl l)200l|) , one can indeed 
take full advantage of observations carried out in differ- 
ent bands to obtain more accurate column density mea- 
surements. The improved technique, called Nicer (Nice 
Revised) optimally balances the information from different 
bands and different stars. As a by-product of the analysis, 
Nicer also allows us to evaluate the expected error on the 
column density map, which is useful to estimate the sig- 
nificance on the detection of substructures and cores. The 
Nicer technique can be described using the following sim- 
ple argument. Equation 150JI written above can be taken 
to be a system of (A — 1) equations to be approximately 
solved for Ay, the approximation being made necessary 
because we can only measure (x) and (x) with limited 
accuracy. The "best" solution for Ay can been obtained 
by minimizing the chi-square quantity 



E 

71 = 1 



[^-X^-KAyfiC + Ey^Xn-X^-KAy] , 

(52) 

Consistently with the notation used above in Eq. I|34(l , we 
called C the covariance matrix of the star colors; moreover, 
the symbol E was used to denote the covariance matrix of 
measurement errors [the two covariance matrices have to 
be added up in Eq. I|52|l in order to properly estimate the 
expected scatter on star colors]. The best estimate of Ay, 
obtained by minimizing the x 2 of Eq. i|52|) , is precisely the 
Nicer estimator. 

Both the Nice and Nicer estimators appear to be 
unbiased provided that: (i) there are no foreground stars 
and (ii) the measured colors Xn are unbiased estimates of 
the true colors Xn- 

In reality, even if the two conditions considered above 
are satisfied, both color excess methods can still be biased 
because of selection effects introduced by the completeness 
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0.15 



"5 




0.05 



Fig. 3. The bias for the Nice method due to the different 
completeness at low and high column densities. This plot 
is evaluated by generating 100 000 stars for various values 
of Ay (marked as dots). Note that, as described in the 
text, for Ay ~ 6.7 we observe a rapid increase in the bias 
due to the change between a selection in the K band and 
a selection in the H band (see Fig. 0J. The small scale 
oscillations on the plot are due to numeric effects. 



A v = 12 



Fig. 4. A graphic explanation of the bias plotted in Fig. El 
At low column densities, we select stars mainly on their 
K magnitude, while at high column densities we mainly 
select using their H magnitude. As a result, the average 
intrinsic color (dotted line) of the observed stars (gray 
bands) changes toward lower H — K values. 

function. To understand this point let us make a simple 
example. Suppose that we carry out our observations in 
two bands, Ai and A2, and that both completeness func- 
tions c\(m\) are not vanishing only on a narrow magni- 
tude range. In this case we would always have x — 
(because a star is observed in both bands only if mi and 
m,2 are inside the narrow detection window) , and thus we 
would always measure Ay ~ 0, independently of the real 
column density. Although unrealistic, this example shows 



20 



10 




Fig. 5. The bias for the Nice method due to the contam- 
ination by foreground stars. Solid lines represents, for dif- 
ferent fractions of foreground stars, the median of the dis- 
tribution of the measured column densities. Dashed lines, 
instead show the expectation value for the median of Ay 
obtained from the measurements of 15 reddenings. 

that we must expect a bias for the Nice method; the ar- 
gument for the Nicer method is similar and leads to the 
same conclusion. 

The bias of these two methods depends on the details 
of the probability distribution pm{M) and of the com- 
pleteness functions c\{m\). However, as an example, we 
evaluated the expected bias for various values of the col- 
umn density and for the typical probability distributions 
that we expect for the 2MASS catalog. As shown in Fig.|3 
the bias in the case considered appears to be limited, be- 
low 0.2 in Ay, and has a characteristic shape: it vanishes 
for Ay = 0, increases quickly for Ay ~ 7, and finally 
saturates for Ay > 11. This behavior has a simple expla- 
nation: 

— Since the colors of unreddened stars are evaluated us- 
ing a control field (where supposedly Ay — 0), the bias 
has to vanish for Ay — 0. 

— The general trend of bias on Ay can be understood 
with the help of Fig. 0| At low Ay, stars that are ob- 
served in the K band are almost certainly also observed 
on the H band, because of the values of the limiting 
magnitudes (approximately 14.9 in H and 14.3 in K) 
and of the average star colors ((H — K) = 0.18). When, 
instead, the reddening is large, we have the opposite 
situation (because K is less affected by reddening than 
H). This different selection at different column densi- 
ties is the source of the observed bias. 

— Finally, at large column densities an asymptotic value 
is reached because now only the H band is used to 
select stars. 

A more serious problem is related to the contamina- 
tion by foreground stars, which can strongly bias our re- 
sults in the direction of lower column densities. For regions 
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Fig. 6. The effect of a foreground star contamination 
/ = 0.1 on the distribution of column densities for differ- 
ent values of Ay. From the bottom to the top we show, for 
Ay £ {0, 10, 20, 30}, both the distribution of background 
stars (solid plot, gray filled), and foreground stars (dashed 
plot). Both distributions are taken to be Gaussian, the 
first centered on Ay, and the second on 0. As the col- 
umn density increases, the relative contribution of fore- 
ground stars also increases because of selection effects. At 
Ay ~ 16, they become predominant and the median of the 
whole distribution quickly moves toward the foreground 
distribution. 

with low extinction, where the expected number of fore- 
ground stars is small compared to the expected number of 
background stars, foreground star contamination is usu- 
ally reduced by using a median estimator for Ay, i.e. by 
averaging the individual column densities measured in the 
direction of eac h star with the median instead of the sim- 
plemean (e.g. ICambresv et alJ l2002t lLombardi fc Alvesl 
200 ij). As shown in Fig. this simple technique is very 
effective and leads to almost unbiased results at low col- 
umn densities (see Appendix lAl for a statistical discussion 
on the median estimator). However, for a given fraction of 
foreground stars / (which, as described above, is evaluated 
in regions with negligible extinction) there is a threshold 
for Ay after which the cloud extinction makes the ex- 
pected number of background stars smaller than the ex- 
pected number of foreground stars. This threshold can be 
evaluated using Eq. Ij43(l and is 



A 1 1 

Ay = —r- logic — r~ 



(53) 



Because of the way the median estimator is defined, we 
observe in Fig. [S] an abrupt change on the measured col- 
umn density Ay close to this value (see also Fig. E}. Note 
that the relatively smooth transition observed in Fig. [S] 
(solid lines) is due to the intrinsic scatter in the star col- 
ors; indeed, in absence of any scatter, we would observe 
an "instantaneous" change from (Ay) = Ay at low col- 
umn densities to (Ay) = at high column densities. In 
Fig. El we also plot the expectation value of a more inter- 



esting quantity, the median over N = 15 measured col- 
umn densities. This quantity differs from the median over 
the whole distribution because of the statistical variations 
on the local number of foreground stars. In other words, 
since N — 15 is a relatively small number, in different 
realizations of our simulations we can have a significantly 
different number of foreground stars. For example, even at 
low column density, we can have a large fraction of fore- 
ground stars; similarly, even at very large column density, 
there is a finite probability to have the majority of stars 
in the background. As a result, the dashed line in Fig. [5] 
has a much smoother transition around the value given 
by Eq. (|53() . This is at the same time a good and a bad 
news. From one side, this means that we can be signifi- 
cantly biased for values of Ay smaller than the threshold 
value of Eq. (|53|l : from the other side, this also means that 
we can still partially make use of the median estimator at 
relatively large column densities. 

For regions with very high column density, it is nor- 
mally quite easy to identify and remove foreground stars 
because of their anomalous color s with respect to t he red- 
dened backg round stars (see, e.g.jAlves et all200l|) . Some 
authors fe.g. lCambresv et"al . 2002) make use of this infor- 
mation to also remove stars in less dense regions using the 
following strategy. The angular density <7f of foreground 
stars is determined using dense regions (where the fore- 
ground/background identification is easy); then, for any 
region of the cloud, the k bluer stars are thrown away, 
where Nt = o^A is the expected number of foreground 
stars in the analyzed region (deduced from the foreground 
density measured in the dense regions). This technique is 
quite simple and reasonably effective, but unfortunately 
introduces a bias at low extinctions. Consider, indeed, 
the limiting case where we have a negligible extinction 
Ay ~ 0. The distinction between foreground and back- 
ground stars is in this situation ambiguous, and thus the 
Nf bluer objects will likely include also some background 
stars. Hence, the results will be biased toward higher col- 
umn density. The exact evaluation of the bias is non triv- 
ial, but can be carried out using the techniques described 
in App. Here we report only an approximated result 
valid for Ay ~ [see Eq. (|A.15I) ]. 



(Ax 



27rErr(A v 



(54) 



where Err(^ly) is the average error of the measured ex- 
tinction from a single star. Note that the result given in 
Eq. H54f) can be taken to be an upper limit for the bias, 
since we expect this to decrease at high column densities, 
where the identification of foreground stars is secure. 

4. A maximum-likelihood approach 

From the discussion above, it is clear that both the star 
count and the Nice(r) methods can produce unsatisfac- 
tory results. On one hand, the star count technique has a 
low signal-to-noise ratio and produces significantly biased 
results at high column densities (see Fig. EJ) . On the other 
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hand, the color excess technique is more sensitive and has 
a smaller bias, but can be severely affected by the con- 
tamination of foreground stars, especially for large values 
of Ay. 

As pointed out bv lCambresv et al.l<l2002ft . we can take 
advantage of the different behavior of the star count and 
the color excess techniques in the various column density 
regimes to partially solve the problem o f the contamina- 
tion by foreground stars. In particular, ICambresv et al] 
note that it is better to use the color excess method at 
low column densities because of its higher sensitivity, and 
to switch to the star count method at very large column 
densities where the Nice method is completely unreliable. 
The optimal "turning point" where we need to change the 
technique can be evaluated empirically by comparing the 
individual results of the two methods at different column 
densities. 

The solution proposed bv lCambresv et all has the sig- 
nificant advantage of being relatively simple to implement 
and reasonably effective, but clearly is suboptimal in many 
respects: 

— The choice of the "turning point" and of the matching 
strategy is to some extent arbitrary. 

— The overall estimate of Ay still remains significantly 
biased at high column densities because at large Ay 
the star counts are used (and this method has a non- 
negligible bias, see Fig.0). 

— The density of foreground stars must still be deter- 
mined separately and is taken to be constant on the 
whole field. 

Using the theoretical framework developed so far, it is 
possible to design and implement an efficient maximum- 
likelihood approach to the problem. 

4.1. Likelihood 

Suppose again that in a region of the sky of area A we 
observe in various bands N stars with magnitudes {rh n }. 
Assuming that the area of the sky is small enough so that 
there arc no significant changes on the relevant parameters 
of the problem (such as the unreddened density , local 
expected density a, or the reddening probability distribu- 
tion pa v (Ay)), then we can easily evaluate the joint prob- 
ability distribution for such a star configuration. First, we 
note that the number of stars inside the region follows a 
Poisson distribution with average Aa: 



p N (N)=e 



-Aa 



{Act) 
N\ 



N 



(55) 



The joint star probability distribution, i.e. the likelihood, 
is given by 



N 



L({rh n }) =p N (N)H 



Prh(m r< 



a 



e- A °A ! 

w 



(56) 



Note that L depends on unknown quantities, such as 
PA v (Ay), through pjy(N), pf n (rh), and a: hence, assum- 
ing that there is no prior for these unknown quantities, 
we can obtain an estimate of them by maximizing L or, 
equivalently, InL. In the following subsections we will in- 
vestigate in more detail this maximum-likelihood estima- 
tor. 

4.2. Implementation 

We implemented the maximum-likelihood estimator using 
the simplification described above. In particular, we used 
the forms and (|33|) for the source magnitude proba- 
bility distribution pm{M). A simple calculation gives the 
following relationships between the coefficients of Eq. I|34|) 
and Eq. Ipl5|) : 

Pab = C~t - 6 la ^ C a'b - S lb ^2 C ab> 
a' b' 

+ 5 la S lb £ , (57) 

a/, b' 

Q a = -6 al a In 10 - C-Jxb + S la ]T C^Xb , (58) 

a' 

R=-2\nn + C- b 1 XaX b ■ (59) 

One of the advantages of the quadratic expression l|35[) 
is that we can write the effects of reddening as a simple 
transformation of the three quantities P, Q, and R. In 
particular, the transformation M i— > M + kAy can be 
rewritten as 



P i 
R i 



P 
R 



Q^Q- AyPk 
2A v Q T k 



(60) 



Hence, reddening does not change the functional form of 
the probability distribution (|33|l but only the three pa- 
rameters involved. 

For the following discussion, it is also convenient to in- 
troduce a new vector fi = (M, 1) = (Mi, Ma, . . . , Ma, 1), 
and a (A + 1) x (A + 1) matrix S defined as 



Mi ••• Pia Qx\ 



s- 1 = 



p 



Pm 



Paa 
Qa 



Qa 
R) 



(61) 



Then we can write the quadratic form appearing in 
Eq. H^5|l simply as 



p M (M) = exp 



(62) 



More importantly, the action of measurement errors can 
be described as simple transformations of S, provided that 
the measurement error on each band can be described as a 
simple Gaussian with vanishing average and variance v. In 
other words, assuming that the errors can be represented 
as a convolution with the Gaussian kernel 

A 



1 



(27T)A/2ntl«A 



exp 



A=l 



^ 2v\ 



(63) 
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then the convolution pm * £ can be written as 



(p M * E)(m) 



detS 
detT 



exp 



(64) 



where T = 5 + diag(uf . . . ,v\, 0). 

In conclusion, the simple implementation of the 
maximum-likelihood method considered here can be sum- 
marized in the following items: 

1. The quantities P, Q, and R appearing in Eq. i|35|) arc 
determined in a control field free from any extinction. 
Similarly, for each band the photometric errors, i.e. the 
functions p e \(rh\\m\), are calculated. 

2. A cloud model (for example, the thin cloud approx- 
imation) is chosen, and the relevant parameters (for 
example, Ay and /) are taken to be unknown and 
constant on the field. 

3. For a given combination of the cloud parameters, the 
likelihood function (|56l) is evaluated using the vari- 
ous observed magnitudes. Note that the evaluation of 
the expected star density a and of the star probabil- 
ity Prhijn) when the star is undetected in one or more 
bands is non-trivial and requires integrations over the 
completeness functions [cf. Eq. (p?2|l ]. 

4. The last step is repeated for different values of the 
cloud parameters and the ones corresponding to the 
minimum of the likelihood function are taken as best 
estimates. 

5. The local curvature of L (i.e., the matrix of its second 
derivatives) is used to estimate the errors on the cloud 
parameters. 

Clearly, one key point here is the speed of the likeli- 
hood function, which needs to be evaluated several times 
in our maximum-likelihood approach. In our implementa- 
tion, the likelihood function has been optimized by per- 
forming the relevant integrations (cf. point 3 above) using 
appropriate bounds. In other words, when an integration 
of Prfi(Tfi) was requested, we estimated the area in the 
magnitude space where this function was significantly dif- 
ferent from zero, and performed the integral inside that 
area (as opposed to performing the integral over the whole 
parameter space). This optimization was found to have a 
significant impact on the overall speed of our implemen- 
tation. 

4.3. Simulations 

The reliability and effectiveness of the maximum- 
likelihood approach were assessed through extensive nu- 
merical simulations. The simulations were designed to 
reproduce with reasonable accuracy the 2MASS near- 
infrared data. We simulated star observations in three 
bands, J, H, and K, and used the various parameters 
described in Table [3 

We initially considered an area of the sky A and a thin 
cloud characterized by a fraction of foreground stars / [cf. 
Eq. I|18|l ] and a reddening Ay. We randomly generated 
there stars inside this area using the following recipe: 




Fig. 7. Log-likelihood surface plot. The plot shows the 
logarithmic of the likelihood ratio as a function of the two 
unknown parameters Ay and / (the real values used for 
the simulation are 10 and 0.2 respectively). The simula- 
tion has been carried out using a A = 25, but the actual 
number of stars generated were N = 19 (because of the 
Poisson noise on the number of stars). On the bottom 
we also plot contours corresponding to 68.2%, 95.5%, and 
99.7% confidence level regions. Note that the likelihood is 
very smooth and has only a single well defined minimum. 



We evaluated the expected local star density a using 
Eqs. JjHJ an d EU- We found that the needed inte- 
grations could be performed more efficiently using a 
Monte-Carlo technique. 

We calculated the effective number of stars N by gen- 
erating a random integer distributed according to a 
Poisson distribution with average Act. 
For each of the N star we adopted the following pro- 
cedure: 

(a) We generated the unreddened magnitudes in the 
three bands according to Eq. IpHjl. 

(b) We then uniformly generated a random number in 
the range [0,1], and considered the star to be in 
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Fig. 8. The log-likelihood (ratio) as a function of Ay for 
different source densities, extremized for /. The largest 
curve is obtained using a A — 10 (in this particular re- 
alization we had N = 8); the mid curve using a A = 20 
(N = 12), and the most peaked curve using a A = 40 
(N = 35). In all cases the true value of the column den- 
sity was Ay = 10, and the fraction of foreground stars was 
/ = 0.2. Note that all curves are very well approximated 
by parabolas. The intersections of the log-likelihood func- 
tions with the three dashed lines mark the 68.2%, 95.5%, 
and 99.7% confidence level regions. 

the foreground if this number was smaller than / 
[defined according to Eq. I|18|l]. 

(c) Background stars were reddened by adding k\Ay 
to each magnitude; the magnitudes of foreground 
stars were left unchanged. 

(d) We added photometric errors to all stars; these, for 
simplicity, were taken to be Gaussian distributed 
with standard deviation v\ = 0.05 independent of 
the band and of the original magnitude. 

(e) For each band, we uniformly generated a random 
number in the range [0, 1], and took the star to be 
detected in the band if this number was smaller 
than the completeness function c\{rh\). In the 
simulations discussed here we used for simplicity 
Heaviside functions for the completeness functions. 

(f) We finally retained the star if it was detected in 
at least one band; otherwise, we repeated all steps 
above from point (a). 

In summary, at the end of a single star generation we had 
for each star the three magnitudes in the bands J, H, and 
K (with possibly some magnitudes rh\ = null) and the 
associated measurement errors v\. 

We then used this dataset to test the reliability and ef- 
ficiency of the maximum-likelihood estimator, and to com- 
pare it with the other column density estimators consid- 
ered in Sect. |3| The maximum- likelihood method was im- 
plemented as described in Sect. 14.21 and was tested against 
the data generated as described in the items above. 

FigureQshows the log-likelihood surface plot obtained 
in a typical simulation. The surface appears to be very 
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Fig. 9. The bias (Ay) — Ay of various column density 
measurement methods. The bias is evaluated from the av- 
erages of 1 000 simulated fields with no foreground contri- 
bution (/ = 0) and with average number of stars a A = 25. 
The maximum-likelihood method, marked in the legend as 
ML for brevity, has negligible bias, especially for Ay < 20. 

smooth with a well defined minimum, an essential con- 
dition for the reliability of the maximum-likelihood ap- 
proach. Moreover, the log-likelihood function is found to 
have approximately a parabolic shape, which further sim- 
plifies the interpretation of the results obtained. For exam- 
ple, this property allow s us to use the likelihood ratio (see, 
e.g.. lEadie et alJll97l|) to draw confidence level regions on 
the parameter space (see the contours of Fig. 0). 

Figure |H1 represents the log-likelihood as a function of 
Ay for three different datasets. The figure was obtained 
by minimizing the log-likelihood function with respect to 
/ for each value of Ay in the range [5, 15], and by plotting 
the value of this function. 

In order to assess more quantitatively the merits of 
the maximum-likelihood method, we compared the statis- 
tical properties of various column density estimators. In 
particular, we simulated a large number of "observations" 
using the technique described above, and we studied the 
distribution of the column densities estimated using the 
maximum-likelihood method, the Nice, and the Nicer 
methods. Simulations were carried out using a thin cloud 
with true extinction in the range Ay € [0, 30] and with 
different values of the foreground fraction /. For the Nice 
and Nicer estimators we used both the simple mean and 
the median of the individual extinction measured for each 
star; moreover, assuming that the density of foreground 
stars (jf could be determined separately, we discarded in 
each simulation the afA bluer stars, and used only the re- 
maining (redder) stars in the analysis. Note that in some 
cases, for large values of Ay and relatively large values of 
/ we had no usable star for the Nice and Nicer analysis; 
in other words, all stars left after the foreground selec- 
tion had only one band available (typically the K band). 
In this case we just obtained a lower limit on Ay by us- 
ing the redder usable star (even if this star was originally 
considered to be in the foreground). 
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umn density measurement methods, evaluated as in Fig.® 
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Fig. 11. As for Fig.® but with / = 0.02. 



Figures I§1 and ITUI show, respectively, the bias and the 
total error obtained from the three methods considered 
here for Ay £ [0, 30] and / = 0. From these plots it is 
evident that the maximum-likelihood estimator does not 
have any significant bias up to Ay = 20 and a very small 
one (of the order of 0.1) for larger column densities. Since 
for / = we never have foreground stars, the bias of the 
Nice and Nicer techniques does not change if we use a 
mean or a median estimator. Note also that the bias in 
Fig.® for these two methods is the one discussed in detail 
above (cf. Fig.®). Regarding the total error, we observe a 
steady increase of it for large values of Ay. This can be 
explained by noting that, although the average number 
of stars uA = 25 is kept constant for all column densi- 
ties, when Ay is large most stars are only detected in the 
K band and thus do not provide reddening information. 
FigureE3 a lso shows that the maximum- likelihood method 
is clearly superior, although Nicer (with the mean esti- 
mator) also performs well. As expected, both median es- 
timators are more noisy than the simple mean (which, for 
/ = 0, is optimal). 

The situation changes quite dramatically when / > 0. 
Figures 1171117)1 show the bias and the total error of the var- 
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Fig. 12. As for Fig. EH but with / = 0.02. 
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Fig. 13. As for Fig.® but with / = 0.05. 
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Fig. 14. As for Fig. E3 but with / = 0.05. 



ious methods for increasing values of /. A careful study of 
these results can reveal several interesting characteristics 
of the Nice and Nicer techniques. 

Let us initially focus on the bias plots, shown in 
Figs. ^JJ El an d El At low column densities, i.e. for 
Ay < 10, we find again the bias described in Sect. 13.21 
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Fig. 15. As for Fig.EI but with / = 0.10. 
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Fig. 16. As for Fig. EH but with / = 0.10. 



and plotted in Fig. (see also above Fig. for the case 
/ = 0). At median column densities, i.e. for Ay ~ 20 (or 
Ay ~ 30 for / = 0.02), the bias can be explained by the 
selection effect due to the correction of foreground stars. 
As explained above [see Eq. i|54fl] , removing the Nf = at A 
bluer stars introduces a systematic error on the extinc- 
tion estimate. This bias is positive (i.e. toward larger ex- 
tinctions) and can be as large as 1 mag in Ay. Finally, 
at high column densities (Ay ~ 30) both methods sys- 
tematically underestimate the extinction. This is due to 
the heavy contamination by foreground stars present at 
these large values of reddening. To better understand this 
point, we note that the subtraction of the Nf bluer stars 
is a simplistic approximation because the actual number 
of foreground stars is not fixed (it is a Poisson random 
variable with average Nf). Depending on the number of 
foreground stars with respect to Nf we can have three 
different situations: 

— If the number of foreground stars is exactly Nf, then 
at high column densities no bias is introduced and the 
estimate of Ay is correctly performed; 

— For a simulation with a number of foreground stars 
smaller than Nf, a small bias toward higher column 



densities is expected, because some of the bluer back- 
ground stars are discarded; 
— Finally, if the number of foreground stars is underesti- 
mated, a very large bias toward smaller column densi- 
ties is introduced. 

Clearly, for large values of Ay the third effect is expected 
to dominate. Indeed, Figs. 1131 and 1151 show that both the 
Nice and Nicer methods significantly underestimate the 
reddening for large values of Ay . In theory, as mentioned 
above, when applying the Nice or Nicer technique to 
high extinction regions, it should be relatively straightfor- 
ward to identify foreground stars by their color, and thus 
the bias of these method could be smaller than suggested 
by the plot shown here. Note that apparently Nicer 
presents a larger bias compared to Nice. This is due to 
the larger flexibility of the Nicer method, which is able 
to obtain a column density estimate when any two of the 
three bands are available. As a result, Nicer is more af- 
fected by the contamination by foreground stars described 
in the items above. Indeed, our simulations show that if 
we force Nicer to use only stars with observed H and K 
bands (i.e. essentially the same stars as the ones used by 
Nice), its bias and its noise are drastically reduced and 
become compatible with the ones of Nice. 

Figures IT2*1 1141 and 1161 show that the total error of the 
Nice and Nicer methods increases very rapidly at large 
column densities, where the contamination by foreground 
stars is very likely; the maximum-likelihood estimator, in- 
stead, has an almost flat error curve. Hence, our novel 
method is able to "recognize" the presence of foreground 
stars; moreover, the inclusion of the background density in 
the likelihood expression allows this estimator to "switch" 
to the number count technique in regions with large ex- 
tinction. Figure El m particularly, shows that even in ex- 
treme cases with a large foreground star contamination 
the maximum-likelihood method is still very reliable and 
accurate. To better appreciate this point, we note that, in 
our simulations, for Ay — 25 and / = 0.2 on average only 
one tenth of the ~ 25 stars are background to the cloud. 

4.4. Limitations 

The maximum-likelihood approach to the extinction mea- 
surements presents clear advantages with respect to the 
standard techniques in the simplified framework consid- 
ered in this section (uniform Ay over the cloud patch 
analysed, thin-cloud approximation, simple model for the 
star intrinsic magnitude distribution). One could thus le- 
gitimately ask whether the maximum- likelihood technique 
is applicable to more realistic and complex situations. 

4.4.1. Small-scale inhomogeneities 

Most clouds present clear sign of substructure at dif- 
ferent scales and a statistical analysis of the radio and 
NIR observations seems to indicate the that these ob- 
jects can be well described in terms of turbulent mod- 
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els (see, e.g. lMiesch fc Balhll994HPadoan et~allll997|) . In 
presence of significant inhomogeneities, most methods (in- 
cluding the maximum-likelihood one) are expected to be 
biased toward low values because the background stars 
will no longer be randomly distributed in the patch of 
the sky used to estimate the local extinction value, but 
will be preferentially detected in low-extinction regions 
[cf. Eq. Ipt3"|) ]. Although a detailed discussion of this effect 
is behind the scope of this paper, it is worth considering 
the following points: 

— For the HK Nice, for the JHK Nicer, and for the 
JHK maximum-likelihood methods, small scale inho- 
mogeneities become important when the local varia- 
tions of Ay on the patch of the sky considered are 
of order A = l/(afc 2 hilO), where = Ah /Ay 
[cf. Eq. ©]. For typical 2MASS observations we find 
A ~ 7.3 mag, and hence all methods should still be ap- 
plicable to the analysis of regions with relatively low 
column densities (approximately Ay < 10 mag). 

— The effect of subst ructures is stud ied in detail in a 
forthcoming paper llLombardilfcOol . where it is also 
presented a method to avoid the bias introduced by 
small-scale inhomogeneities. The application this novel 
technique to a Nicer map of the Pipe nebula con- 
firms the expectations summarized in the last item. 
In particular, for the Pipe nebula the "standard" 
Nicer has a negligible small bias (below 0.2 mag) for 
Ay < 10 mag, while the bias increases dramatically 
for larger extinction values (e.g., it reaches 1 mag at 
Ay — 15 mag). A similar bias behavior is expected 
for the maximum-likelihood technique described in this 
section. 

— Since small-scale inhomo geneities are believed to be 
due to turbulent motions L arsonl IT98lt IPadoan et alJ 



Il997t iHever fc Brund EffllF the probability distribu 
tion pA v {Ay) is expected to be a log- normal 
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(65) 



Independently from the exact form of pA v (Ay), the 
maximum-likelihood approach can be used in this more 
general framework to obtain the relevant parameters 
of pa v (e.g., M and S in the case of the log-normal 
of Eq. I|65|0 : moreover, the p arameters estimate d are 
asymptotically unbiased (see lEadie et alJ Il97lj) . We 
will consider the use of a non-trivial extinction proba- 
bility distributions similar to the one of Eq. l|o5fl in a 
follow-up paper. 

4.4.2. Star magnitude distribution 

In the implementation of the maximum-likelihood tech- 
nique discussed in this section, we used the simple model 
for the magnitude probability distribution pm{M) [cf. 
Eq. H34fl ]. However, the functional form of pm{M) used 
can be inaccurate in describing real data for several rea- 
sons: 



— Different stellar populations can produce multiple 
peaks in the color distribution of stars. For example, 
giant and dwarf stars produce two distinct peaks in 
the J — H histogram (the two peaks are also clearly 
visible in a JHK color-color diagram). 

— Different stellar populations can also have different 
slopes of the number counts, which could different sig- 
nificantly from the "no minal" value a ~ 0.34 (see, e.g. 
ICambresv et all r2002j) . If the various stellar popula- 
tions also have different intrinsic mean colors, then the 
simple model (|34[l could lead to inaccurate extinction 
measurements . 

— The average NIR colors of stars in regions free of ex- 
tinction are not completely independent of the star 
luminosity. For example, a color-magnitude plot of 
2MASS stars shows that the average J — K color 
slightly increases as the K magnitude increases. Since 
this effect is rather small, the associated bias is prob- 
ably negligible in most cases; moreover, this effect can 
be included in the expression Q35JI with a suitable 
choice of the coefficients P, Q, and R. 

All issues described above are strictly related to the sim- 
plified description for the probability pm(M) used here, 
and not to the maximum-likelihood method itself. In other 
words, it is possible (and relatively easy) to implement 
a maximum-likelihood estimator based on a more realis- 
tic probability distribution for the star magnitudes (e.g., 
synthetic stellar po pulation models; see lRobin et al.ll2004t 
Ijarrett et al.lll994j) . For this purpose, we note that the 
most computationally effective way to generalize pm(M) 
is by writing it as a linear combination of functions of 
the form 134(1 (with each function representing, de facto, 
a different star population). 

4.4.3. General remarks 

On of most significant drawback of the maximum- 
likelihood approach is its speed. The implementation used 
in this paper is approximately one order of magnitude 
slower than the Nicer method (at least on a typical work- 
station), and this might prevent large applications of the 
method proposed here. On the other hand, the fast tech- 
nological progress in the computer speed justifies the work 
presented in this paper, in the sense that soon it will be 
possible to use the maximum-likelihood method on large 
datasets composed of millions of stars. 

Another possible issue related to the technique pre- 
sented here is the need for a more detailed knowledge of 
the properties of the data used. As a comparison, we note 
that the original Nice technique makes use only of the 
H and K magnitudes of stars and of the average color 
(H — K) of stars in the control field. In addition to these 
data, Nicer also requires the estimated errors of star mag- 
nitudes in the NIR bands used. Finally, the maximum- 
likelihood method requires a detailed knowledge of the 
probability distribution pm{M), of the measurement er- 
rors p ex (rh\\m\) of each star, and of the completeness 
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functions C\(rh\). In reality, in the Bayesian approach im- 
plicitly adopted in this paper, the maximum-likelihood 
method can also be used if an approximate knowledge of 
these parameters is available. Suppose, for example, that 
the parameters P, Q, and R that characterize pu (M) [see 
Eq. l|35fl ] are only approximately known from the mea- 
surements in the control field. In this case we can take 
these parameters as unknowns in the expression for the 
likelihood, and we can thus minimize L with respect to 
them as well as with respect to the parameters that char- 
acterize pa v {Av) {Ay and / in the case considered here). 
As customary in standard maximum-likelihood problems, 
we include the knowledge on P, Q, and R as a prior in 
the function L. Note that this way the dimension of the 
space over which we need to minimize L greatly increases 
(and this can pose severe computational problems), but in 
principle the schema proposed is applicable to real cases. 

Interestingly, the Nice and Nicer technique can be 
seen as special cases of the maximum-likelihood method 
when no prior knowledge on R is available: the complete 
lack information on the normalization of pm(M) forces 
our method to use the only color information of the stars. 
Similarly, the number counts method can be recovered as 
special case when the knowledge on the average colors \a 
is absent [or, equivalently, when the scatter in the colors 
C a b is very large; cf. Eq. J23J]. This suggests that the 
Nice(r)) techniques are not more robust of the maximum- 
likelihood one, but just simpler. 

Finally, we mention that the presence of young stellar 
objects (that could be for example embedded in the dense 
cores) can in principle affect the extinction measurements. 
Hence, these objects should be "manually" removed before 
using any extinction measurement method, including the 
maximum-likelihood described in this paper. 

5. Discussion and conclusions 

In this paper we have considered the problem of an accu- 
rate determination of the extinction toward a dark molec- 
ular cloud. The results obtained here can be summarized 
in the following items: 

— The extinction and the reddening of background stars 
have been considered from a general statistical point 
of view. 

— The bias and uncertainties of the two main NIR tech- 
niques used to map the extinction, the star count and 
the color excess methods, have been discussed in de- 
tail. We have shown that, although the color excess 
method has generally a smaller error, it is affected by 
a large bias in presence of contamination by foreground 
stars. We have also shown that both Nice and Nicer 
arc affected by a relatively small bias (approximately 
0.2mag in Ay) as a result of selection effects. 

— A new optimal maximum-likelihood method has been 
presented and tested with extensive simulations. 

The simulations described in Sect. 14.31 have shown that 
in the simple case of a thin cloud the maximum-likelihood 



estimator performs significantly better than the Nice and 
Nicer estimators (since the number count method is 
known to have a larger noise, we did not report a de- 
tailed comparison with this method here). However, the 
maximum-likelihood approach also allows us to consider 
more general cloud configurations, which cannot be easily 
dealt with using standard techniques. For example, the 
maximum-likelihood techniques could be used to measure 
directly on the same patch of the sky both the column 
density Ay and the fraction of foreground stars /; alter- 
natively, it would be also possible to determine / globally 
on the cloud and take it as a constant on the whole field 
(a good approximation for small clouds). 

In Sect. 14. 4 Tl we discussed one of the most serious lim- 
itations of NIR color excess studies in molecular clouds, 
namely the bias introduced by substructures. In our orig- 
inal formulation, the maximum-likelihood method can 
be used not only in the thin cloud approximation, but 
in general for any functional form of pA v {Ay). Hence, 
as mentioned above, we could implement the maximum- 
likelihood method using a more realistic probability dis- 
tribution for Ay , such as the one of Eq. I|65|l . 

Another possibility offered by the maximum-likelihood 
approach is the generalization of the thin-cloud approxi- 
mation to a multi-layer case, where two or more (thin) 
clouds located at different distances are observed on over- 
lapping areas of the sky. For example, in case of a double 
cloud we could write [cf. Eq. i|17|) ] 

PaAAv) = f (1) S(A v ) + fP>6(A v - 4 X) ) 

+ {1 _ f w_ f m )s{Av _ A w_ A my 

(66) 

This configuration, for example, might be appropriate to 
study clouds close to the galactic center, where the su- 
perposition of different complexes is very likely. Such a 
method could effectively disentangle the effects of the two 
clouds provided the values of /W and f^ are sufficiently 
different. Alternatively, one could use a double cloud as a 
null test, i.e. to check that indeed the thin-cloud approx- 
imation is appropriate (in this case one expects to find 
A [ y ] ~ 0, AP ~ 0, or /W ~ 0). Some of these possibili- 
ties will be investigated in detail in a follow-up paper by 
using 2MASS data. 
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Appendix A: The median and related estimators 

The goal of this appendix is to derive the probability distribu- 
tion of the median of n independent identical random variables. 
This analysis is useful to address some of the issues discussed 
in Sect.ES 
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A.l. Notation 

In the following we will often deal with ordered and unordered 
n-tuples. The latter can be taken to be a set of n elements, 
where typically each element is a random variable or an element 
of another tuple; the former can be taken to be a list of n 
elements, where thus each element is associated to a position 
in the list. Hence, given a positive integer fc < n, it makes sense 
to talk about the fc-th element of an ordered n-tuple, while this 
is meaningless for an unordered tuple. 

We will denote an ordered n-tuple using brackets, as in 
[x\, . . . , x n ]; instead, we will reserve braces for the unordered 
tuples, as in {xi, . . . ,x n }. Note that, for consistency with the 
definition, we identify unordered tuples if these differ just by a 
permutation of the elements: thus {xi,X2,xs} is, for instance, 
identical to {X2, Xi, xz}. 

A.2. P£ and the median estimator 

Let us call p(x) the probability distribution for the real ran- 
dom variable x, and let us consider the generation of unordered 
n-tuples {x\, X2, x n } of such variables. Suppose that, af- 
ter the random generation of a tuple, we order the tuple such 
that x\ < X2 < ■ ■ ■ < i n . We consider now the probability 
distribution p k (x k ) for the fc-th element of the ordered tuple 
[xi, X2, ■ ■ ■ , x n ], which can be written as 

pU^) = nh k 2ljp(xk)[P(x k )] k - l [l-P(x k )] n - k , (A.l) 

where P(x) is the cumulative probability distribution of x: 

p(x')dx'. (A.2) 

- oc 

The expression appearing in Eq. JA.lll can be explained as 
follows. Let us consider an element (for example the first) of 
the unordered n-tuple. The probability that this element is the 
fc-th in the ordered tuple and that it has a value in the range 
[x, x + dx] is the product of three terms: 

— p(x) dx to takes into account the intrinsic probability dis- 
tribution of x; 

~ i^Zl) [P( x k)] k 1 [l — P(xk)] n fc , which is a simple binomial 
distribution giving the probability that the value chosen 
has fc — 1 elements of the order n-tuple at its left, and n — fc 
elements at its right. 

— Finally, we have to multiply the whole result by n in order 
to take into account the arbitrary choice of the fc-th element 
in the n-tuple. 

The cumulative distribution P k {xh) is given by 
pl(x' k )dx' k 

- OO 

(fc-l)l(n-fc)! Q > [ I ) 
x / [P(x k )] k ~ 1+i p(x' k )dx' k 

J — OO 

= EnrBtJl^r. (a.3) 

m=k \ / \ / 
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Fig. A.l. The cumulative probability distribution P^{xk) 
as a function of P{x) for n = 7 and various values of k 
(marked close to the relative curves). 
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Fig. A.2. The cumulative probability distribution P£{xk) 
as a function of P{x) for k £ {1,2,3,4,5,6,7} and 
n = 2k — 1. Note that all curves pass through the point 
(0.5,0.5). 

In the last step we changed the index variable into m = fc + 1. 
The final result obtained in Eq. l|A.3fl has the advantage to be 
a simple (polynomial) expression in P[x). 

Figures IA.1I and IA.2I show the polynomials P k ( x 1 as a 
function of P(x k ) for various values of n and fc. These figures 
suggest a number of properties for P k (xk) that can be verified 
analytically with the help of the equations written above: 

— P fe n (x) = if P(x) = 0, and P£(x) = 1 if P(x) = 1; 
moreover, P k is a monotonic increasing function of P. This 
implies that p k (x) vanishes where p(x) vanishes. 

— Using Eq. ijA.lfl . it is possible to show that [see below 
Eq. ESS) ] 

1 n 1 n 

-Y l Pk(x) = P(x), -^2pl{x)=p(x). (A.4) 
n z — ' n *■ — ' 

fe=l k=l 

— As suggested by Fig. lA.ll and by Eq. ijA.lfl . we have 

P fc "(x) = l-P" +1 _ fc (x'), (A.5) 

provided that P(x) = l-P(x'). This, in particular, implies 
that P k k ~ 1 (x) = 1/2 if P(x) = 1/2 (see Fig. 1X31. 
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Equation 1A.3II specialized to the cases k = 1 and k = n is distributions. Let us then evaluate the sum 



= [POri)]" , PZfrn) = 1 - [1 - P{x n )Y . 

(A.6) 

The two last properties have a close relationship with the me- 
dian estimator. We note, in fact, that for odd n the median 
estimator is in our notation xr n -\-V)/2', as a result, the probabil- 
ity distribution p k lk ~ 1 (xk) is just the probability distribution of 
the median for n = 2k — 1. Hence, the property (IA.5II written 
above can be rephrased as 

The median of the probability distribution of the me- 
dian estimator of the random variable x is the median 
of the probability distribution of x. 

Note that here we use the term "median" to denote both the 
usual median of an n-tuple {xi, . . . , x n }, and the median of a 
distribution, defined as the value x such that the value cumu- 
lative distribution is 1/2. 



A.3.1% andP^ k 

For our purposes, it is also useful to define and study two prob- 
ability distributions closely related to P k (x k ). Let us consider 
again the ordered n-tuple [x\, . . . , x n ], where each element is 
drawn from a probability distribution p(x). Now let us retain 
only the first k elements of the ordered n-tuple, and let us call 
P<k( x <k) the probability distribution for each element of the 
unordered fc-tuple {xi, . . . , x k }; similarly, we call p> k (x> k ) the 
probability distribution for the tuple {x k , ■ ■ ■ ,x n } where only 
the last (n — k+ 1) elements of the original ordered n-tuple are 
retained. 

The evaluation of the probability distribution p> k (x) can 
be broken into two parts. Consider the (k — l)-th element x k -i 
of the tuple [xi,. . . , x n ] (i.e., the last element discarded); then, 
clearly, each element of the (n — k + l)-tuple {x k , ■ ■ ■ , i,,} can- 
not be smaller than Xk-i- Moreover, each element of this un- 
ordered tuple is distributed between xt-i and oo according to 
the (truncated) original probability distribution p(x). Finally, 
by repeating this argument for each possible value of x k ~\ 
(weighted by Pu-i), we obtain the expression 



P>k(x) 



Pk-i(x k -l): 



p(x) 



P(x k - 



■ dxk- 



(A.7) 



Note that the term 1 — P(xk-i) is introduced here to correctly 
normalize the truncated probability distribution p(x). 

As usual in this appendix, it is more convenient to consider 
the cumulative probability distributions. We can thus integrate 
p>k( x ) over x and obtain, after a few manipulations, a closed 
expression for P> k (x). Here, however, we prefer to follow a 
different and simpler path. 

Let us consider again the tuple [xk, ■ ■ ■ ,x n ]. Each element 
of this ordered tuple follows the probability distribution p k (x k ) 
discussed in the previous section. Hence, if we consider the un- 
ordered tuple, the elements {xk, ■ ■ ■ , x„} will follow the average 
distribution 



P>k(x) 



k + 



y E Pkfc) 



(A.8) 



By integrating both sides of this equation on da;, we can verify 
that the same relation holds for the cumulative probability 



1 n 

k' = k 

= — \ — T ( n ) [p(x)] 

Tl-k + l ^ \m) 1 v ;J 

*£(-d 



■^k + m j m— 1 
.fe'-l 



(A.9) 



We now consider two cases. If k = 1, then the last sum can 
be taken to be the binomial expansion of (1 — l) m_1 , which 
vanishes for all integers m > 1. Hence we are left just with the 
case m = l, for which we obtain 



1 n 

P >i = - E P k 0*0 = P{x) 

— 77 * ' 



(A.10) 



This shows that each element of the unordered n-tuple 
{xi, . . . ,x n } follows the original distribution p(x), a very nat- 
ural result indeed. 

If, instead, k > 1, then using the properties of the binomial 
coefficient we find 



1 n 



, +k ( n\ fm-2 
m 



(A.ll) 



The similarity of this result with the last line of Eq. l|A.3|l is 
rather surprising. 

Similarly, we wish to investigate the cumulative probability 
distribution P< k (x) for the fe-tuple {xi, . . . , Xk}. This quantity 
is better evaluated using P" k (x): 



■k'=l k'=k+l 

nP(x) -(n- k)P> k+1 (x) 



" 1 (-l) m+k 
k w k ^r> \mj V k- 1 J K ' 

m — fc + l 



n _., . 1 v-^ I n \ ( m — 2 
-P(x) + r > 

k. ' k ^ \ m 

[P(x)] 



(A.12) 



As an application of the results obtained in this section, 
we evaluate the bias introduced in the estimate of the column 
density when using the technique described in Sect. 13.21 to re- 
move foreground stars [see Eq. 1541 1 . Suppose that we observe 
stars in a region with no significant extinction, so that both 
foreground and background stars have the same distribution 
in colors. For simplicity, we assume that the reddening esti- 
mates Av for each individual star is a Gaussian distribution 
with vanishing average and variance a^ v . In this situation, if 
we exclude the k bluer stars (because they are taken to be 
foreground), we will bias the estimate of Av toward large col- 
umn densities. In particular, the column densities for the stars 
left will be distributed according to p> k +i- The median of this 
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distribution can be evaluated as follows. At Av 
P{A V ) = 1/2, while 



P>k+i(0) 



we have 



(A.13) 



Assuming n ^> k, then each term in the sum above is approx- 
imately unity (cf. above Fig. IA.H . Hence we obtain 



P> k+ i(0) 



2k 



2n - 2k 



(A.14) 



The fact that this quantity is not exactly 1/2 is telling us that 
there is a bias. The exact amount of this bias can be evalu- 
ated by solving the equation P> fe+1 (:r) = 1/2. Using Newton's 
method we obtain then the approximate solution 





-l 


-i 


1 


dx 




x=0 


2 



>fc+l 



(0) 



k 



2np(0) 



(A.15) 



Note that in this equation we used the relationship between 
cumulative and differential probability distributions; moreover, 
we simplified pj(0) ~ at low k, and retained only terms linear 
in k/n. The final result obtained is thus given by Eq. 1541 . 
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